Mel'nikov Boris Feliksovich, Doctor of physical and mathematical sciences, professor, sub-department of information systems and networks, Russian State Social University (4 Wilgelma Pika street, Moscow, Russia), firstname.lastname@example.org
Trenina Marina Anatol'evna, Senior lecturer, sub-department of applied mathematics and informatics, Institute of mathematics, physics and information technologies, Togliatti State University (14 Belorusskaya street, Togliatti, Russia), email@example.com
Kochergin Aleksandr Sergeevich, Postgraduate student, Russian State Social University (4 Wilgelma Pika street, Moscow, Russia), firstname.lastname@example.org
Background. In practice, it is often necessary to calculate the distance between sequences of a different nature. Similar algorithms are used in bioinformatics to compare sequenced genetic chains. Because of the large dimensionality of such chains, we have to use heuristic algorithms that give approximate results. Therefore, the problem arises of estimating the quality of the metrics (distances) used, which, according to its results, one can conclude that the algorithm is applicable to various studies. Purpose of the study – improving the quality of the distance between thee long strings.
Materials and methods. To compare the genetic chains taken from the open bank NCBI, we offer a heuristic algorithm developed on the basis of the Needleman – Wunsch algorithm. After implementing this algorithm, a special function with three parameters to the obtained metric values is applied, which are determined by the method of gradient descent.
Results. A qualitative evaluation of the operation of algorithms for calculating the distance between DNA strings was obtained. An approach to the improvement of such algorithms was developed.
Conclusions. It was proposed to improve the Needleman – Wunsch algorithm for comparison of string sequences, and also formulate an approach to improving other algorithms for constructing metrics on long lines.
1. Ayala F., Kayger Dzh. Sovremennaya genetika: per. s angl.: v 3 t. [Modern genetics: translation from English: in 3 volumes]. Moscow: Mir, 1987, vol. 1, 295 p.
2. Hamming R. W. The Bell System Technical Journal. 1950, vol. 29 (2), pp. 147–160.
3. Levenshteyn V. I. Doklady Akademii nauk SSSR [Reports of the USSR Academy of Sciences]. 1965, vol. 163.4, pp. 845–848.
4. Kormen T. Leyzerson Ch., Rivest R., Shtayn K. Algoritmy. Postroenie i analiz [Algorithms. Building and analysis]. Moscow: Vil'yams, 2005, 1296 p.
5. Needleman S., Wunsch C. Journal of Molecular Biology. 1970, vol. 48 (3), pp. 443–453.
6. Smith T. F., Waterman M. S. Journal of Molecular Biology. 1981, vol. 147, pp. 195–197.
7. Hromkovic J. Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics. Germany: Springer, 2003, 538 p.
8. Mel'nikov B., Romanov N. Teoreticheskie problemy informatiki i ee prilozheniy [Theoretical problems of informatics and its applications]. 2001, vol. 4, pp. 81–87.
9. Melnikov B., Radionov A., Moseev A., Melnikova E. ICSOFT, Technologies, Proceedings 1st International Conference on Software and Data Technologies. Germany: Springer, 2007, pp. 272–279.
10. Mel'nikov B. F., Panin A. G. Vektor nauki Tol'yattinskogo gosudarstvennogo universiteta [The scientific vector of Togliatti State University]. 2012, no. 4 (22), pp. 83–86.
11. Akho A., Ul'man Dzh. Teoriya sintaksicheskogo analiza, perevoda i kompilyatsii. Sintaksicheskiy analiz [The theory of syntactical analysis, translation and compilation]. Moscow: Kniga po Trebovaniyu, 2012, vol. 1, 613 p.
12. Home – Nucleotide – NCBI. Available at: https://www.ncbi.nlm.nih.gov/nuccore
13. Pages H., Aboyoun P., Gentleman R., DebRaoy S. Biostrings: String Objects Representing Biological Sequences and Matching Algorithms. Bioconductor. Available at: https://bioc.ism.ac.jp/packages/2.6/bioc/html/Biostrings.html
14. Pairwise_Alignment_Mammals. Available at: https://yadi.sk/d/Oa-DTAC83SCzRq
15. Frans B. M. Bonobo: The Forgotten Ape. University of California Press, 1998, 224 p.
16. Melnikov B. F., Pivneva S. V., Trifonov M. A. Informatsionnye tekhnologii i nanotekhnologii (ITNT-2017): sb. tr. III Mezhdunar. konf. i molodezhnoy shkoly [Information technologies and nanotechnologies (ITNT-2017): proceedings of III International conference and youth school]. Samarskiy natsional'nyy issledovatel'skiy universitet imeni akademika S. P. Koroleva. Samara, 2017, pp. 1640–1645.
17. Kolmogorov A. N., Fomin S. V. Elementy teorii funktsiy i funktsional'nogo analiza [Elements of the theory of fuctions and functional analysis]. MGU im. M. V. Lomonosova. Ed. 7th. Moscow: Fizmatlit, 2004, 570 p.